Dataset Distillation (DD), a newly emerging field, aims to generate much smaller yet high-quality synthetic datasets from large ones. Existing DD methods based on gradient matching achieve leading performance; however, they are extremely computationally intensive, as they require continuously optimizing a dataset among thousands of randomly initialized models. In this paper, we assume that training the synthetic data with diverse models leads to better generalization performance. Thus we propose two \textbf{model augmentation} techniques, \ie using \textbf{early-stage models} and \textbf{weight perturbation}, to learn an informative synthetic set with significantly reduced training cost. Extensive experiments demonstrate that our method achieves up to a 20$\times$ speedup while performing on par with state-of-the-art baseline methods.
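As a rough illustration of the weight perturbation idea, one way to diversify models is to add small Gaussian noise to a trained model's parameters, yielding perturbed copies at negligible cost. The sketch below is a minimal NumPy version under assumptions of ours (the function name, the raw-array "model," and the `sigma` value are illustrative, not the paper's implementation):

```python
import numpy as np

def perturb_weights(weights, sigma=0.01, rng=None):
    """Return a perturbed copy of a list of weight arrays:
    w' = w + sigma * eps, with eps drawn from a standard normal."""
    rng = np.random.default_rng(rng)
    return [w + sigma * rng.standard_normal(w.shape) for w in weights]

# Example: a tiny two-layer "model" represented as raw weight arrays
weights = [np.ones((4, 8)), np.ones((8, 2))]
augmented = perturb_weights(weights, sigma=0.05, rng=0)
max_shift = max(np.abs(w - wa).max() for w, wa in zip(weights, augmented))
```

Each call with a different seed produces a distinct model in weight space, which is the property the augmentation relies on.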
This paper studies how to flexibly integrate reconstructed 3D models into practical 3D modeling pipelines such as 3D scene creation and rendering. Due to technical difficulties, one can only obtain rough 3D models (R3DMs) for most real objects using existing 3D reconstruction techniques. As a result, physically-based rendering (PBR) would render low-quality images or videos for scenes that are constructed from R3DMs. One promising solution is representing real-world objects as Neural Fields such as NeRFs, which are able to generate photo-realistic renderings of an object under desired viewpoints. However, a drawback is that the synthesized views through Neural Fields Rendering (NFR) cannot reflect the simulated lighting details on R3DMs in PBR pipelines, especially when object interactions in the 3D scene creation cause local shadows. To solve this dilemma, we propose a lighting transfer network (LighTNet) to bridge NFR and PBR, such that they can benefit from each other. LighTNet reasons about a simplified image composition model, remedies the uneven surface issue caused by R3DMs, and is empowered by several perceptually-motivated constraints and a new Lab angle loss which enhances the contrast between lighting strength and colors. Comparisons demonstrate that LighTNet is superior in synthesizing impressive lighting, and is promising in pushing NFR further in practical 3D modeling workflows. Project page: https://3d-front-future.github.io/LighTNet .
Single-view point cloud completion aims to recover the complete geometry of an object based on only limited observations, which is extremely difficult due to data sparsity and occlusion. The core challenge is to generate plausible geometry that fills in the unobserved parts of the object based on a partial scan, a problem that is severely under-constrained and has a huge solution space. Inspired by the classical shadow volume technique in computer graphics, we propose a new method to effectively reduce the solution space. Our method regards the camera as a light source casting rays toward the object. Such light rays establish a reasonably constrained yet sufficiently expressive basis for completion. The completion process is then formulated as a point displacement optimization problem. Points are initialized at the partial scan and then moved to their target positions with two types of movement per point: directional movement along the light rays and constrained local movement for shape refinement. We design a neural network to predict the ideal point movements and obtain the completion result. We demonstrate through exhaustive evaluations and comparisons that our method is accurate, robust, and generalizable. Moreover, it outperforms state-of-the-art methods both qualitatively and quantitatively on the MVP dataset.
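The two-part point displacement described above can be sketched numerically: each partial-scan point moves some distance along its camera ray, plus a bounded local offset for refinement. The function below is a hypothetical NumPy sketch of that parameterization (the names and the clipping bound are our assumptions; in the paper the displacements are predicted by a network):

```python
import numpy as np

def displace_points(points, cam_origin, t, local_offset, max_local=0.05):
    """Move each partial-scan point a distance t along its camera ray,
    plus a clipped local offset for shape refinement."""
    rays = points - cam_origin                              # rays from camera to points
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)     # unit ray directions
    local = np.clip(local_offset, -max_local, max_local)    # constrained refinement
    return points + t[:, None] * rays + local

# Example: camera at the origin, one point one unit away along z
points = np.array([[0.0, 0.0, 1.0]])
moved = displace_points(points, np.zeros(3),
                        t=np.array([0.5]), local_offset=np.zeros((1, 3)))
```

Restricting most of the motion to the ray direction is what shrinks the solution space relative to free 3D displacement.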
Attention-based models have been widely used in many domains such as computer vision and natural language processing. However, their application to time series classification (TSC) has not yet been deeply explored, and many TSC algorithms still suffer from the general drawbacks of attention mechanisms, such as quadratic complexity. In this paper, we improve both the efficiency and performance of the attention mechanism by proposing Flexible Multi-Head Linear Attention (FMLA), which enhances locality awareness through deformable convolutional blocks and online knowledge distillation. More importantly, we propose a simple but effective mask mechanism that helps reduce the influence of noise in time series and decreases the redundancy of the proposed FMLA by proportionally masking certain positions of each given series. To stabilize this mechanism, samples are forwarded through a random mask layer several times and their outputs are aggregated to teach the same model equipped with a regular mask layer. We conduct extensive experiments on 85 UCR2018 datasets to compare our algorithm with 11 well-known algorithms, and the results show that our algorithm achieves comparable performance in terms of top-1 accuracy. We also compare our model with three transformer-based models with respect to floating-point operations per second and the number of parameters, and find that our algorithm is significantly more efficient at lower complexity.
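The stabilization step described above can be sketched concretely: a series is passed through a random mask several times and the outputs are averaged to form a teaching target. The following is a minimal NumPy sketch under our own assumptions (the masking-by-zeroing, the function names, and the pass count are illustrative, not the paper's exact layers):

```python
import numpy as np

def random_mask(x, ratio, rng):
    """Zero out a fixed proportion of randomly chosen time steps."""
    mask = np.ones_like(x)
    idx = rng.choice(x.shape[-1], size=int(ratio * x.shape[-1]), replace=False)
    mask[..., idx] = 0.0
    return x * mask

def aggregated_teacher_output(f, x, ratio=0.2, passes=4, seed=0):
    """Forward the series several times under random masks and average
    the outputs, forming a self-distillation target for the same model."""
    rng = np.random.default_rng(seed)
    outs = [f(random_mask(x, ratio, rng)) for _ in range(passes)]
    return np.mean(outs, axis=0)

# Toy "model": sum over time steps of a constant series of length 10
target = aggregated_teacher_output(lambda s: s.sum(), np.ones(10), ratio=0.2)
```

Averaging over several random masks smooths out the variance any single mask introduces, which is the stabilizing effect claimed for the mechanism.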
This paper proposes an efficient federated distillation learning system (EFDLS) for multi-task time series classification (TSC). EFDLS consists of a central server and multiple mobile users, where different users may run different TSC tasks. EFDLS has two novel components: a feature-based student-teacher (FBST) framework and a distance-based weight matching (DBWM) scheme. Within each user, the FBST framework transfers knowledge from its teacher's hidden layers to its student's hidden layers via knowledge distillation, with the teacher and student sharing the same network structure. For each connected user, the weights of its student model's hidden layers are periodically uploaded to the EFDLS server. The DBWM scheme is deployed on the server, using the minimum squared distance to measure the similarity between the weights of two given models. This scheme finds a partner for each connected user such that the user's and its partner's weights are the closest among all uploaded weights. The server exchanges and sends back the users' and their partners' weights to the two users, which then load the received weights into their teachers' hidden layers. Experimental results show that the proposed EFDLS achieves excellent performance on a set of selected UCR2018 datasets in terms of accuracy.
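The server-side matching step can be illustrated directly: treat each user's uploaded hidden-layer weights as a flat vector and pair each user with the user at minimum squared distance. The sketch below is a simplified stand-in for the DBWM scheme (flattened vectors and the function name are our assumptions):

```python
import numpy as np

def match_partners(weight_vectors):
    """For each user's uploaded weight vector, return the index of the
    partner whose weights have the minimum squared distance."""
    W = np.asarray(weight_vectors, dtype=float)
    d = ((W[:, None, :] - W[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)                         # a user cannot partner itself
    return d.argmin(axis=1)                             # partner index per user

# Example: three users; the first two have near-identical weights
partners = match_partners([[0.0], [0.1], [5.0]])
```

Note that the matching need not be symmetric: an outlier user gets its nearest neighbor as a partner even if that neighbor is matched elsewhere.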
Deep neural networks notoriously suffer from dataset bias, which is harmful to model robustness, generalization, and fairness. In this work, we propose a two-stage debiasing scheme to combat stubborn unknown biases. By analyzing the factors behind the emergence of biased models, we design a novel learning objective that cannot be reached by relying on biases alone. Specifically, the debiased model is achieved with the proposed Gradient Alignment (GA), which dynamically balances the contributions of bias-aligned and bias-conflicting samples throughout the whole training process, forcing the model to exploit intrinsic cues to make fair decisions. Since in real-world scenarios the underlying biases are extremely hard to discover and prohibitively expensive to label manually, we further propose an automatic method for mining bias-conflicting samples via peer-picking and training ensembles, without any prior knowledge of the bias information. Experiments on multiple datasets across various settings demonstrate the effectiveness and robustness of our proposed scheme, which successfully alleviates the negative impact of unknown biases and achieves state-of-the-art performance.
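One plausible reading of the balancing idea is a per-sample reweighting that equalizes the total loss (and hence gradient) contribution of the bias-conflicting group against the bias-aligned group. The sketch below is our own hypothetical simplification, not the paper's GA formulation:

```python
import numpy as np

def ga_weights(losses, is_conflicting, ratio=1.0):
    """Hypothetical rebalancing: scale bias-conflicting samples so their
    total loss contribution equals `ratio` times that of the aligned ones."""
    losses = np.asarray(losses, dtype=float)
    conflict = np.asarray(is_conflicting, dtype=bool)
    w = np.ones_like(losses)
    aligned_sum = losses[~conflict].sum()
    conflict_sum = losses[conflict].sum()
    if conflict_sum > 0:
        w[conflict] = ratio * aligned_sum / conflict_sum
    return w

# Example: two aligned samples dominate one conflicting sample
w = ga_weights([2.0, 2.0, 1.0], [False, False, True])
```

With equal group contributions, gradient descent can no longer reach a low loss by fitting the bias shortcut alone, which matches the stated design goal of the learning objective.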
As a crucial robotic perception capability, visual tracking has been intensively studied recently. In real-world scenarios, the onboard processing time of the image streams inevitably leads to a discrepancy between the tracking results and the real-world states. However, existing visual tracking benchmarks commonly run the trackers offline and ignore such latency in the evaluation. In this work, we aim to deal with the more realistic problem of latency-aware tracking. State-of-the-art trackers are evaluated in aerial scenarios with new metrics that jointly assess tracking accuracy and efficiency. Moreover, a new predictive visual tracking baseline is developed to compensate for the latency stemming from the onboard computation. Our latency-aware benchmark can provide a more realistic evaluation of trackers for robotic applications. In addition, exhaustive experiments demonstrate the effectiveness of the proposed predictive visual tracking baseline.
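Latency compensation in such predictive baselines is often built on a simple motion model: when a result arrives several frames late, extrapolate the target forward by the lag. The sketch below assumes a constant-velocity model, which is a common choice for such baselines; whether this matches the paper's exact predictor is an assumption of ours:

```python
import numpy as np

def predict_box(prev_center, curr_center, latency_frames):
    """Compensate processing latency with a constant-velocity forecast:
    extrapolate the tracked center by the number of frames the onboard
    computation lagged behind the image stream."""
    velocity = np.asarray(curr_center, dtype=float) - np.asarray(prev_center, dtype=float)
    return np.asarray(curr_center, dtype=float) + latency_frames * velocity

# Example: center moved (1, 2) per frame; the result is 2 frames stale
pred = predict_box(prev_center=(0, 0), curr_center=(1, 2), latency_frames=2)
```

The output is then compared against the world state at the frame when the result becomes available, which is exactly what the latency-aware evaluation measures.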
Object detectors are usually learned on fully-annotated training data with fixed predefined categories. However, the categories often need to be increased progressively. In this scenario, typically only the original training set annotated with the old classes and some new training data labeled with the new classes are available. Based on such limited datasets, a unified detector that can handle all categories is strongly needed. We propose a practical scheme to achieve this. A conflict-free loss is designed to avoid label ambiguity, leading to an acceptable detector in a single training round. To further improve performance, we propose a retraining phase in which Monte Carlo Dropout is employed to compute localization confidence for mining more accurate bounding boxes, and an overlap-weighted method is proposed to make better use of the pseudo annotations during retraining. Extensive experiments demonstrate the effectiveness of our method.
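The Monte Carlo Dropout step above can be sketched as running several stochastic forward passes and using the spread of the predicted box coordinates as a localization confidence (low spread implies high confidence). The confidence formula and names below are our illustrative assumptions, not the paper's exact scoring:

```python
import numpy as np

def mc_dropout_confidence(predict, x, passes=10, seed=0):
    """Estimate localization confidence via Monte Carlo Dropout-style
    sampling: average the boxes from several stochastic passes and map
    their coordinate spread to a confidence in (0, 1]."""
    rng = np.random.default_rng(seed)
    boxes = np.stack([predict(x, rng) for _ in range(passes)])
    mean_box = boxes.mean(axis=0)
    confidence = 1.0 / (1.0 + boxes.std(axis=0).mean())
    return mean_box, confidence

# Example: a deterministic "detector" yields zero spread, hence confidence 1
mean_box, conf = mc_dropout_confidence(
    lambda x, rng: np.array([10.0, 10.0, 50.0, 50.0]), x=None)
```

Boxes with high confidence can then be kept as pseudo annotations for the retraining phase.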
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
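The efficient sampling step can be illustrated with a generic probability-flow ODE solver. The sketch below integrates the VP-type probability flow ODE with a plain Euler scheme; it is a generic sketch under our assumptions (a toy score function and Euler rather than the paper's semi-linear solver), not the CDGS graph model:

```python
import numpy as np

def sample_prob_flow_ode(score, x_T, beta, n_steps=100, T=1.0):
    """Euler integration of the VP probability-flow ODE
        dx/dt = -0.5*beta(t)*x - 0.5*beta(t)*score(x, t)
    from t = T down to t ~ 0, starting from a prior sample x_T."""
    x = np.array(x_T, dtype=float)
    dt = -T / n_steps                      # integrate backward in time
    for i in range(n_steps):
        t = T + i * dt
        drift = -0.5 * beta(t) * x - 0.5 * beta(t) * score(x, t)
        x = x + drift * dt
    return x

# Sanity check: with the standard-normal score s(x) = -x, the drift
# vanishes and a prior sample should pass through unchanged.
result = sample_prob_flow_ode(lambda x, t: -x, [0.7, -1.2], beta=lambda t: 1.0)
```

Because the flow is deterministic, higher-order ODE solvers can take far larger steps than ancestral SDE sampling, which is the source of the speedup the abstract mentions.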